Artificial Intelligence in Medicine
Elsevier BV
All preprints, ranked by how well they match Artificial Intelligence in Medicine's content profile, based on 15 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Patel, D.; Timsina, P.; Gorenstein, L.; Glicksberg, B. S.; Raut, G.; Cheetirala, S.; Santana, F.; Tamegue, J.; Kia, A.; Zimlichman, E.; Levin, M.; Freeman, R.; Klang, E.
Predicting hospitalization from nurse triage notes has significant implications in health informatics. To this end, we compared the performance of the deep-learning transformer-based model, bio-clinical-BERT, with a bag-of-words logistic regression model incorporating term frequency-inverse document frequency (BOW-LR-tf-idf). A retrospective analysis was conducted using data from 1,391,988 Emergency Department patients at the Mount Sinai Health System spanning 2017-2022. The models were trained on four hospitals' data and externally validated on a fifth. Bio-clinical-BERT achieved higher AUCs (0.82, 0.84, and 0.85) compared to BOW-LR-tf-idf (0.81, 0.83, and 0.84) across training sets of 10,000, 100,000, and ~1,000,000 patients, respectively. Notably, both models proved effective at utilizing triage notes for prediction, despite the modest performance gap. Importantly, our findings suggest that simpler machine learning models like BOW-LR-tf-idf could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain.
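As a concrete illustration of the BOW-LR-tf-idf baseline described above, the sketch below fits a TF-IDF bag-of-words logistic regression on a few invented triage notes using scikit-learn. The notes, labels, and parameters are placeholders, not the study's data or code.

```python
# Minimal sketch of a BOW-LR-tf-idf baseline for triage-note classification.
# Toy notes and labels are illustrative stand-ins for the study's EHR data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

notes = [
    "chest pain radiating to left arm, diaphoretic",
    "mild ankle sprain after fall, ambulatory",
    "shortness of breath, history of CHF",
    "laceration to finger, bleeding controlled",
]
admitted = [1, 0, 1, 0]  # 1 = hospitalized, 0 = discharged

# Unigrams and bigrams weighted by term frequency-inverse document frequency
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(notes)

model = LogisticRegression(max_iter=1000)
model.fit(X, admitted)

# Probability of admission per note; the study computes AUC on an
# external hospital's data rather than the training set as done here.
probs = model.predict_proba(X)[:, 1]
auc = roc_auc_score(admitted, probs)
```

In the study's setting the same pipeline would be fit on millions of notes from four hospitals and scored on a held-out fifth hospital.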
Ferri, P.; Saez, C.; Felix-De Castro, A.; Sanchez-Cuesta, P.; Garcia-Gomez, J. M.
When developing Machine Learning models to support emergency medical triage, it is important to consider how changes in the data over time can negatively affect the models' performance. The objective of this study was to assess the effectiveness of novel Deep Continual Learning pipelines in maximizing model performance when input features are subject to change over time, including the emergence of new features and the disappearance of existing ones. The model is designed to identify life-threatening situations, predict the admissible response delay, and determine the institutional jurisdiction. We analyzed a total of 1,414,575 events spanning 2009 to 2019. Our findings demonstrate important improvements in absolute F1-score over the current triage protocol (up to 4.9% for life-threatening situations, 18.5% for response delay, and 1.7% for jurisdiction), and improvements of up to 4.4% for life-threatening situations and 11% for response delay with respect to non-continual approaches.
Di Noto, T.; Atat, C.; Teiga, E. G.; Hegi, M.; Hottinger, A.; Cuadra, M. B.; Hagmann, P.; Richiardi, J.
Natural Language Processing (NLP) on electronic health records (EHRs) can be used to monitor the evolution of pathologies over time to facilitate diagnosis and improve decision-making. In this study, we designed an NLP pipeline to classify Magnetic Resonance Imaging (MRI) radiology reports of patients with high-grade gliomas. Specifically, we aimed to distinguish reports indicating changes in tumors between one examination and the follow-up examination (treatment response/tumor progression versus stability). A total of 164 patients with 361 associated reports were retrieved from routine imaging, and reports were labeled by one radiologist. First, we assessed which embedding is more suitable when working with limited data, in French, from a specific domain. To do so, we compared a classic embedding technique, TF-IDF, to a neural embedding technique, Doc2Vec, after hyperparameter optimization for both. A random forest classifier was used to classify the reports into stable (unchanged tumor) or unstable (changed tumor). Second, we applied the post-hoc LIME explainability tool to understand the decisions taken by the model. Overall, classification results obtained in repeated 5-fold cross-validation with TF-IDF reached around 89% AUC and were significantly better than those achieved with Doc2Vec (Wilcoxon signed-rank test, P = 0.009). The explainability toolkit run on the TF-IDF model revealed some interesting patterns: words indicating change, such as "progression", were rightfully frequent for reports classified as unstable; similarly, words indicating no change, such as "not", were frequent for reports classified as stable. Lastly, the toolkit discovered misleading words, such as "T2", which are clearly not directly relevant for the task. All the code used for this study is made available.
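The embedding comparison above hinges on a Wilcoxon signed-rank test over paired per-fold AUCs, which SciPy provides directly. The fold AUCs below are invented placeholders, not the paper's numbers.

```python
# Sketch: comparing two embeddings' per-fold AUCs with a Wilcoxon
# signed-rank test, as done for TF-IDF vs. Doc2Vec. AUCs are invented.
from scipy.stats import wilcoxon

tfidf_aucs   = [0.901, 0.882, 0.893, 0.914, 0.875,
                0.906, 0.887, 0.898, 0.909, 0.890]
doc2vec_aucs = [0.850, 0.840, 0.830, 0.860, 0.820,
                0.845, 0.835, 0.855, 0.845, 0.825]

# Paired, non-parametric test on the per-fold differences
stat, p_value = wilcoxon(tfidf_aucs, doc2vec_aucs)
significant = p_value < 0.05  # here TF-IDF wins in every fold
```

Because the test pairs folds, it controls for fold-to-fold difficulty, which an unpaired comparison of mean AUCs would not.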
Sezgin, E.; Sirrianni, J.; Kranz, K.
Objective: We present a proof-of-concept digital scribe system as an ED clinical conversation summarization pipeline and report its performance. Materials and Methods: We use four pre-trained large language models to establish the digital scribe system: T5-small, T5-base, PEGASUS-PubMed, and BART-Large-CNN, via zero-shot and fine-tuning approaches. Our dataset includes 100 referral conversations among ED clinicians and medical records. We report ROUGE-1, ROUGE-2, and ROUGE-L to compare model performance. In addition, we annotated transcriptions to assess the quality of generated summaries. Results: The fine-tuned BART-Large-CNN model demonstrates the strongest summarization performance, with the highest ROUGE scores (ROUGE-1 F1 = 0.49, ROUGE-2 F1 = 0.23, ROUGE-L F1 = 0.35). In contrast, PEGASUS-PubMed lags notably (ROUGE-1 F1 = 0.28, ROUGE-2 F1 = 0.11, ROUGE-L F1 = 0.22). BART-Large-CNN's performance decreases by more than 50% with the zero-shot approach. Annotations show that BART-Large-CNN achieves 71.4% recall in identifying key information and a 67.7% accuracy rate. Discussion: The BART-Large-CNN model demonstrates a high level of understanding of clinical dialogue structure, indicated by its performance with and without fine-tuning. Despite some instances of high recall, there is variability in the model's performance, particularly in achieving consistent correctness, suggesting room for refinement. The model's recall varies across different information categories. Conclusion: The study provides evidence for the potential of AI-assisted tools to reduce clinical documentation burden. Future work should expand the research scope with larger language models and comparative analyses that measure documentation effort and time.
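ROUGE-1 F1, the headline metric above, is the harmonic mean of unigram precision and recall between a generated summary and a reference. The hand-rolled sketch below shows what the number measures; a real evaluation would use a maintained package such as rouge-score, whose tokenization and stemming differ.

```python
# Illustrative ROUGE-1 F1: unigram overlap between candidate and reference.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most min(cand, ref) times
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 and ROUGE-L follow the same precision/recall pattern over bigrams and longest common subsequences, respectively.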
Cai, L.; Zhang, T.; Beets-Tan, R.; Brunekreef, J.; Teuwen, J.
The use of Electronic Health Records (EHRs) has increased significantly in recent years. However, a substantial portion of the clinical data remains in unstructured text formats, especially in the context of radiology. This limits the application of EHRs for automated analysis in oncology research. Pretrained language models have been utilized to extract feature embeddings from these reports for downstream clinical applications, such as treatment response and survival prediction. However, a thorough investigation into which pretrained models produce the most effective features for rectal cancer survival prediction has not yet been done. This study explores the performance of five Dutch pretrained language models, including two publicly available models (RobBERT and MedRoBERTa.nl) and three developed in-house for the purpose of this study (RecRoBERT, BRecRoBERT, and BRec2RoBERT), each trained on distinct Dutch-only corpora, in predicting overall survival and disease-free survival outcomes in rectal cancer patients. Our results showed that our in-house BRecRoBERT, a RoBERTa-based language model trained from scratch on a combination of Dutch breast and rectal cancer corpora, delivered the best predictive performance for both survival tasks, achieving a C-index of 0.65 (0.57, 0.73) for overall survival and 0.71 (0.64, 0.78) for disease-free survival. It outperformed models trained on general Dutch corpora (RobBERT) or Dutch hospital clinical notes (MedRoBERTa.nl). BRecRoBERT demonstrated the potential to predict survival in rectal cancer patients using Dutch radiology reports at diagnosis. This study highlights the value of pretrained language models that incorporate domain-specific knowledge for downstream clinical applications. Furthermore, it shows that utilizing data from related domains can improve the quality of feature embeddings for certain clinical tasks, particularly in situations where domain-specific data is scarce.
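The C-index reported above can be made concrete with a small, self-contained sketch for right-censored survival data. This pairwise version is illustrative only; it skips tied observed times and ignores refinements found in production implementations such as lifelines.

```python
# Illustrative concordance index (C-index) for right-censored survival data.
# A pair is usable when the subject with the shorter observed time had an
# event (not censoring); the pair is concordant when the model assigns that
# subject the higher risk. Tied risks earn half credit.
def c_index(times, events, risks):
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Order the pair so 'a' has the shorter observed time
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if times[a] == times[b] or events[a] == 0:
                continue  # tied times or censored earlier subject: skip
            usable += 1
            if risks[a] > risks[b]:
                concordant += 1.0
            elif risks[a] == risks[b]:
                concordant += 0.5
    return concordant / usable
```

A C-index of 0.5 is chance-level ranking and 1.0 is perfect, so the 0.71 reported for disease-free survival means the model orders roughly seven of ten usable patient pairs correctly.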
Gao, Y.; Myers, S.; Chen, S.; Dligach, D.; Miller, T.; Bitterman, D. S.; Chen, G.; Mayampurath, A.; Churpek, M. M.; Afshar, M.
Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, using structured electronic health record data on three diagnosis tasks. We examined three current methods of extracting LLM probability estimations and revealed their limitations. We aim to highlight the need for improved techniques in LLM confidence estimation.
Niset, A.; Melot, I.; Pireau, M.; Englebert, A.; Scius, N.; Flament, J.; El Hadwe, S.; Al Barajraji, M.; Thonon, H.; Barrit, S.
Background: Emergency departments face increasing pressure from staff shortages, patient surges, and administrative burdens. While large language models (LLMs) show promise in clinical support, their deployment in emergency medicine presents technical and regulatory challenges. Previous studies often relied on simplistic evaluations using public datasets, overlooking real-world complexities and data privacy concerns. Methods: At a tertiary emergency department, we retrieved 79 consecutive cases during a peak 24-hour period, constituting a siloed dataset. We evaluated six pipelines combining open- and closed-source embedding models (text-embedding-ada-002 and MXBAI) with foundational models (GPT-4, Llama3, and Qwen2), grounded through retrieval-augmented generation with emergency medicine textbooks. The models' top-five diagnostic predictions on early clinical data were compared against reference diagnoses established through expert consensus based on complete clinical data. Outcomes included diagnostic inclusion rate, ranking performance, and citation sourcing capabilities. Results: All pipelines showed comparable diagnostic inclusion rates (62.03-72.15%) without significant differences in pairwise comparisons. Case characteristics, rather than model combinations, significantly influenced predictive diagnostic performance. Cases with specific diagnoses were correctly diagnosed significantly more often than unspecific ones (85.53% vs. 31.41%, p<0.001), as were surgical versus medical cases (79.49% vs. 56.25%, p<0.001). Open-source foundational models demonstrated superior sourcing capabilities compared to GPT-4-based combinations (OR: 33.92 to ∞, p<1.4e-12), with MXBAI/Qwen2 achieving perfect sourcing. Conclusion: Open- and closed-source LLMs showed promising and comparable predictive diagnostic performance in a real-world emergency setting when evaluated on siloed data.
Case characteristics emerged as the primary determinant of performance, suggesting that current limitations reflect fundamental challenges of AI alignment in medical reasoning rather than model-specific constraints. Open-source models demonstrated superior sourcing capabilities, a critical advantage for interpretability. Continued research exploring larger-scale, multi-centric efforts, including real-time applications and human-computer interactions, as well as real-world clinical benchmarking and sourcing verification, will be key to delineating the full potential of grounded LLM-driven diagnostic assistance in emergency medicine.
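The study's primary outcome, diagnostic inclusion rate, reduces to a simple top-k membership check over each case's differential. The helper below is a hypothetical sketch with invented case data, not the study's siloed dataset or code.

```python
# Sketch: fraction of cases whose reference diagnosis appears anywhere in a
# pipeline's top-k differential. Predictions and references are invented.
def inclusion_rate(predictions, references, k=5):
    hits = sum(
        1
        for top, ref in zip(predictions, references)
        if ref.lower() in (d.lower() for d in top[:k])
    )
    return hits / len(references)

# Example: the first case's reference diagnosis is in the differential,
# the second's is not, so the rate is 1/2.
preds = [
    ["appendicitis", "cholecystitis", "gastritis"],
    ["migraine", "tension headache"],
]
refs = ["Appendicitis", "stroke"]
rate = inclusion_rate(preds, refs)
```

Ranking performance, the study's second outcome, would additionally weight where in the top five the reference diagnosis lands.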
Bernaola, N.; De Lima, G.; Riano, M.; Llanos, L.; Heili-Frades, S.; Sanchez, O.; Lara, A.; Plaza, G.; Carballo, C.; Gallego, P.; Larranaga, P.; Bielza, C.
Objectives: To present a model that enhances the accuracy of clinicians when presented with a possibly critical COVID-19 patient. Methods: A retrospective study was performed with information on 5,745 SARS-CoV-2-infected patients admitted to the emergency rooms of 4 public hospitals in Madrid belonging to the Quiron Salud Health Group (QS) from March 2020 to February 2021. Demographics, clinical variables on admission, laboratory markers, and therapeutic interventions were extracted from electronic clinical records. Traits related to mortality were found through difference-in-means testing and through feature selection by learning multiple classification trees with random initialization and selecting the variables that were used the most. We validated the model through cross-validation and tested generalization with an external dataset from 4 hospitals belonging to the Sanitas Hospitals Health Group. The usefulness of two different models in real cases was tested by measuring the effect of exposure to the model's decision on the accuracy of medical professionals. Results: Of the 5,745 admitted patients, 1,173 died. Of the 110 variables in the dataset, 34 were found to be related to our definition of criticality (death in <72 hours) or to all-cause mortality. The models had an accuracy of 85% and a sensitivity of 50%, averaged through 5-fold cross-validation. Similar results were found when validating with data from the 4 Sanitas hospitals. The models were found to have 11% better accuracy than doctors at classifying critical cases and improved doctors' accuracy by 12% for non-critical patients, reducing the cost of mistakes made by 17%.
Miao, B. Y.; Rodriguez Almaraz, E.; Ashraf Ganjouei, A.; Suresh, A.; Zack, T.; Bravo, M.; Raghavendran, S.; Oskotsky, B.; Alaa, A.; Butte, A. J.
Background: Molecular biomarkers play a pivotal role in the diagnosis and treatment of oncologic diseases, but staying updated with the latest guidelines and research can be challenging for healthcare professionals and patients. Large language models (LLMs), such as MedPalm-2 and GPT-4, have emerged as potential tools to streamline biomedical information extraction, but their ability to summarize molecular biomarkers for oncologic disease subtyping remains unclear. Auto-generation of clinical nomograms from text guidelines could illustrate a new type of utility for LLMs. Methods: In this cross-sectional study, two LLMs, GPT-4 and Claude-2, were assessed for their ability to generate decision trees for molecular subtyping of oncologic diseases with and without expert-curated guidelines. Clinical evaluators assessed the accuracy of biomarker and cancer subtype generation, as well as the validity of molecular subtyping decision trees, across five cancer types: colorectal cancer, invasive ductal carcinoma, acute myeloid leukemia, diffuse large B-cell lymphoma, and diffuse glioma. Results: Both GPT-4 and Claude-2 "off the shelf" successfully produced clinical decision trees that contained valid instances of biomarkers and disease subtypes. Overall, GPT-4 and Claude-2 showed limited improvement in the accuracy of decision tree generation when guideline text was added. A Streamlit dashboard was developed for interactive exploration of subtyping trees generated for other oncologic diseases. Conclusion: This study demonstrates the potential of LLMs like GPT-4 and Claude-2 in aiding the summarization of molecular diagnostic guidelines in oncology. While effective in certain aspects, their performance highlights the need for careful interpretation, especially in zero-shot settings. Future research should focus on enhancing these models for more nuanced and probabilistic interpretations in clinical decision-making.
The developed tools and methodologies present a promising avenue for expanding LLM applications in various medical specialties. Key points:
- Large language models, such as GPT-4 and Claude-2, can generate clinical decision trees that summarize best-practice guidelines in oncology.
- Providing guidelines in the prompt query improves the accuracy of oncology biomarker and cancer subtype information extraction.
- However, providing guidelines in zero-shot settings does not significantly improve generation of clinical decision trees for either GPT-4 or Claude-2.
Benani, A.; Ohayon, S.; Laleye, F.; Bauvin, P.; Messas, E.; Bodard, S.; Tannier, X.
Machine learning has demonstrated success in clinical decision-making, yet the added value of multimodal approaches over unimodal models remains unclear. This systematic review evaluates studies comparing multimodal and unimodal ML algorithms for diagnosis, prognosis, or prescription. A comprehensive search of MEDLINE up to January 2025 identified 97 studies across 12 medical specialties, with oncology being the most represented. The most common data fusion involved tabular data and images (67%). A risk-of-bias assessment using PROBAST revealed that 57% of studies had a low risk of bias, while 41% had a high risk. Multimodality outperformed unimodality in 91% of cases. No correlation between dataset sample size and added performance was observed. However, considerable methodological heterogeneity and potential publication bias warrant caution in interpretation. Further research is needed to refine evaluation metrics and hybrid model architectures based on specific clinical tasks. MeSH terms: Humans, Machine Learning, Clinical Decision-Making, Systematic Review.
Ishaque, A. H.; Boutet, A.; Hiremath, S. B.; Mullarkey, M. P.; Peris-Celda, M.; Zadeh, G.
Purpose: Large language models (LLMs) have demonstrated advanced capabilities in interpreting text and visual inputs. Their potential to transform oncological practice is significant, but their accuracy and reliability in interpreting medical imaging and offering management suggestions remain underexplored. This study aimed to evaluate the performance of ChatGPT in interpreting T1-weighted contrast-enhanced MRI images of meningiomas and glioblastomas and providing treatment recommendations based on simulated patient inquiries. Methods: This observational cohort study utilized publicly available MRI datasets. Thirty cases of meningiomas and glioblastomas were randomly selected, yielding 90 images (three orthogonal planes per case). ChatGPT-4o was tasked with interpreting these images and responding to six standardized patient-simulated questions. Two neuroradiologists and neurosurgeons assessed ChatGPT's performance using five-point Likert scales, and their inter-rater agreement was evaluated. Results: ChatGPT identified MRI sequences with 91.7% accuracy and localized tumors correctly in 66.7% of cases. Tumor size was qualitatively described in 85% of cases, and the median acceptability was rated 4.0 (IQR 4.0-5.0) by neuroradiologists. ChatGPT included meningioma in the differential diagnosis for 73.3% of meningioma cases and glioma for 83.3% of glioblastoma cases. Inter-rater agreement among neuroradiologists ranged from moderate to good (κ = 0.45-0.72). While surgical treatment was suggested in all symptomatic cases, neurosurgeon acceptability ratings varied, with poor inter-rater reliability. Conclusions: ChatGPT demonstrates potential in interpreting neuro-oncological MRI images and offering preliminary management recommendations. However, errors in tumor localization and variability in recommendation acceptability underscore the need for physician oversight and further refinement of LLMs before clinical integration.
Dwivedi, K.; Mahbod, A.; Ecker, R. C.; Janjic, K.
Oral squamous cell carcinoma (OSCC) accounts for a major part of cancer mortality, with survival outcomes highly dependent on early diagnosis. While many approaches have been proposed for OSCC survival prediction, they often rely on unimodal data, which may be suboptimal. In this study, we introduced a unified cross-attention-based deep learning framework that integrates whole-slide histopathology images (WSIs) and transcriptomic data from OSCC patients for survival prediction. The framework employed an autoencoder for transcriptomic feature extraction and a state-of-the-art pathology foundation model (evaluated across five alternatives) to derive WSI embeddings. These embeddings were subsequently integrated using cross-attention and concatenation within a Cox proportional hazards model. The multimodal approach outperformed nearly all unimodal counterparts, achieving a maximum concordance index of 0.780±0.059 with cross-attention and 0.766±0.050 with concatenation. The results indicate that pathotranscriptomic integration could improve survival prediction for OSCC patients. The implementation is available on GitHub at: https://github.com/kountaydwivedi/multimodal fusion.git. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Chen, L.-C.; Zack, T.; Demirci, A.; Sushil, M.; Miao, B.; Kasap, C.; Butte, A. J.; Collisson, E.; Hong, J.
Purpose: We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports. Methods: We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Utilizing GPT-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist. Results: Among 164 pancreatic adenocarcinoma patients, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (micro-F1). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled at deriving correct inferences directly from "Objective Findings". Most tested models demonstrated proficiency in identifying disease-containing anatomical locations from a list of choices, with GPT-4 and Llama3-8B showing near parity in precision and recall for disease site identification. However, open models struggled to differentiate benign from malignant post-surgical changes, impacting their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment. Conclusion: LLMs, especially GPT-4, are proficient at deriving oncological insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and healthcare analytics. This study also underscores the possibility of zero-shot open-model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable dataset for further LLM research in oncology.
Corso, F.; Peppoloni, V.; Mazzeo, L.; Leone, G.; Passos, L.; Miskovic, V.; Armanini, J.; Ferrarin, A.; Wiest, I. C.; Wolf, F.; Montelatici, G.; Romano', R.; Ambrosini, P.; Capoccia, T.; Natangelo, S.; Rota, S.; Andena, P.; De Ponti, M.; Russo, A.; Stasi, G.; Provenzano, L.; Spagnoletti, A.; Meazza Prina, M.; Cavalli, C.; Giani, C.; Serino, R.; Borraccino, M.; Bonalume, C.; Di Mauro, R. M.; Agosta, C.; Dumitrascu, A. D.; Di Liberti, G.; Corrao, G.; Beninato, T.; Ganzinelli, M.; Occhipinti, M.; Brambilla, M.; Proto, C.; Kather, J. N.; Pedrocchi, A. L. G.; De Braud, F.; Lo Russo, G.; Baili, P.; P
Real-world data (RWD), largely stored in unstructured electronic health records (EHRs), are critical for understanding complex diseases like cancer. However, extracting structured information from these narratives is challenging due to linguistic variability, semantic complexity, and privacy concerns. This study evaluates the performance of four locally deployable small language models (SLMs), LLaMA, Mistral, BioMistral, and MedLLaMA, for information extraction (IE) from Italian EHRs within the APOLLO 11 trial on non-small cell lung cancer (NSCLC). We examined three prompting strategies (zero-shot, few-shot, and annotated few-shot) across English and Italian, involving clinicians with varying expertise to assess prompt design's impact on accuracy. Results show that general-purpose models (e.g., LLaMA 3.1 8B) outperform biomedical models in most tasks, particularly in extracting binary features. Multiclass variables such as TNM staging, PD-L1, and ECOG were more difficult due to implicit language and lack of standardization. Few-shot prompting and native-language inputs significantly improved performance and reduced hallucinations. Clinical expertise enhanced consistency in annotation, particularly among students using annotated examples. The study confirms that privacy-preserving SLMs can be deployed locally for efficient and secure cancer data extraction. Findings highlight the need for hybrid systems combining SLMs with expert input and underline the importance of aligning clinical documentation practices with SLM capabilities. This is the first study to benchmark SLMs on Italian EHRs and investigate the role of clinical expertise in prompt engineering, offering valuable insights for the future integration of SLMs into real-world clinical workflows.
Sorin, V.; Glicksberg, B. S.; Barash, Y.; Konen, E.; Nadkarni, G.; Klang, E.
Purpose: Recently introduced large language models (LLMs) such as ChatGPT have already shown promising results in natural language processing in healthcare. The aim of this study is to systematically review the literature on applications of LLMs in breast cancer diagnosis and care. Methods: A literature search was conducted using MEDLINE, focusing on studies published up to October 22nd, 2023, using the following terms: "large language models", "LLM", "GPT", "ChatGPT", "OpenAI", and "breast". Results: Five studies met our inclusion criteria. All studies were published in 2023, focusing on ChatGPT-3.5 or GPT-4 by OpenAI. Applications included information extraction from clinical notes, question-answering based on guidelines, and patient management recommendations. The rate of correct answers varied from 64% to 98%, with the highest accuracy (88-98%) observed in information extraction and question-answering tasks. Notably, most studies utilized real patient data rather than data sourced from the internet. Limitations included inconsistent accuracy, prompt sensitivity, and overlooked clinical details, highlighting areas for cautious LLM integration into clinical practice. Conclusion: LLMs demonstrate promise in text analysis tasks related to breast cancer care, including information extraction and guideline-based question-answering. However, variations in accuracy and the occurrence of erroneous outputs necessitate validation and oversight. Future work should focus on improving the reliability of LLMs within clinical workflows.
Harchandani, S.; Quinn, R.; Mittal, K.; Martin, A.; Wang, M.-J.; Holstead, R. G.
The expanding capacity of large language models allows for improvements in patient and provider healthcare quality and experience. The medical oncology consultation often includes a discussion of a life-limiting diagnosis and complex treatment protocols. Patient recall from the discussion may be limited, and it is possible that a patient-specific written summary could help with understanding, recall, and overall experience. Using a privacy-compliant large language model, a prompt was instructed to rewrite an ambulatory medical consultation note as a patient-friendly summary, capturing key details from a diagnosis and treatment plan. The summary was provided to both provider and patient for review, and a 5-point Likert survey was administered inquiring about the output's accuracy, clarity, and helpfulness. Patients reported agreement of 100%, 100%, and 87% on each topic, respectively. 93% of patients recommended the use of similar summaries in the future. Providers reported agreement of 98%, 91%, and 96% for accuracy, clarity, and empathy, respectively. All providers (100%) recommended that similar summaries be used in the future. Some of the summaries retained jargon, and results from this study will be used to optimize the prompt for an expanded study. In conclusion, a patient-friendly summary derived from a medical note using a large language model prompt was helpful to patients and found to be useful by providers. Author Summary: As medical oncology providers, our new patient consultation appointments often require disclosing the diagnosis of a cancer and a discussion of prognosis, complex treatment plans, the potential for significant side effects, and a number of tests/procedures that are required prior to initiation of the care plan. Patients often benefit from friends or family who take notes during an appointment; however, this is not always possible.
Technological advances in natural language processing with large language models such as ChatGPT allow for translation of medical language into plain language. In this study, we used a prompt to rewrite a medical note into a summary of the patient's oncologic diagnosis and care plan. We then provided this summary to patients and providers to assess their feedback on the value of these summaries. We found that both providers and patients found these summaries to be accurate and understandable. Both groups recommended further development of these summaries. We intend to optimize our summary production for future studies using findings and feedback from this project.
Adamson, B. J.; Waskom, M.; Blarre, A.; Kelly, J.; Krismer, K.; Nemeth, S.; Gipetti, J.; Ritten, J.; Harrison, K.; Ho, G.; Linzmayer, R.; Bansal, T.; Wilkinson, S.; Amster, G.; Estola, E.; Benedum, C. M.; Fidyk, E.; Estevez, M.; Shapiro, W.; Cohen, A. B.
Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHRs) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports) into a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an ML-extraction approach are those whose value is determined solely by an ML model (i.e., not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with manually abstracted variables. These extraction methods produced research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusions: NLP and ML enable the extraction of retrospective clinical data from EHRs with speed and scalability to help researchers learn from the experience of every person with cancer.
Khashei, I.; Presciani, D.; Martinelli, L. P.; Grosjean, S.
Retrieval-augmented generation (RAG) is increasingly adopted to ground clinical conversational agents in external knowledge sources, yet many deployed prototypes lack the observability required for standard RAG evaluation. In particular, retrieved documents and grounding context are often not logged, preventing direct assessment of retrieval quality and faithfulness. We report a post-hoc evaluation of EMSy, a clinical RAG-based chatbot prototype, based on 2,660 multi-turn conversations collected between January and September 2025. Rather than benchmarking performance, we adopt an evaluation strategy based exclusively on observable signals. The analysis combines an exploratory intent analysis conducted on a random subset of heterogeneous interactions, automated quality scores available at the message and conversation level, and explicit user feedback, with 96.0% of rated conversations receiving positive feedback. Results indicate that message-level minimum scores capture localized low-quality responses that are not reflected by average conversation-level metrics, while user feedback reflects aggregate interaction impressions. This case study illustrates how diagnostic insights can be obtained under limited observability and identifies implications for the design and evaluation of future clinical RAG systems.
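The finding that message-level minimum scores expose failures which conversation-level averages hide can be shown with a toy computation. The scores and the flagging threshold below are hypothetical, not EMSy's actual metrics:

```python
def conversation_diagnostics(message_scores, threshold=0.5):
    """Return (mean, minimum, flagged) for one conversation's per-message
    quality scores; flagged when any single message falls below threshold."""
    mean_score = sum(message_scores) / len(message_scores)
    min_score = min(message_scores)
    return mean_score, min_score, min_score < threshold

# A 6-turn conversation with one poorly grounded response (0.2): the mean
# still looks healthy, but the minimum flags the localized failure.
scores = [0.9, 0.85, 0.2, 0.95, 0.9, 0.8]
mean_s, min_s, flagged = conversation_diagnostics(scores)
print(f"mean={mean_s:.2f} min={min_s:.2f} flagged={flagged}")
# → mean=0.77 min=0.20 flagged=True
```

This is exactly the asymmetry the study reports: aggregate metrics track overall impressions, while the per-message minimum is the diagnostic signal for localized low-quality responses.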
Romagnoli, F.; Pellegrini, M.
Background: The ideal of personalized medicine is to support the clinical decision process towards the right drug for the right patient at the right time, by using, among other diagnostic tools, molecular biomarkers that depend specifically on the patient's status and on the therapeutic options. Several challenges must be overcome to realize this vision. Patients present a wide spectrum of genetic variability even before developing disease, and diseases like cancer add an extra layer of mutations, while only a very small fraction of such variants have diagnostic or prognostic value. Moreover, it is also challenging to predict how a patient will respond to a specific drug based on the patient's omic profile, since any drug introduces further perturbations into the biochemical model. Methods: In this paper we propose Personalized-DrugRank, a method for joint prediction of therapy response and time-to-response for cancer patients undergoing pharmacological therapy after surgery. The method personalizes the DrugMerge drug-repositioning methodology to extract a few synthetic indices useful as input to ML prediction tools. In particular, the proposed methodology is a novel and principled approach to merging independent patient-specific transcriptomic data with drug perturbation data from cell-line assays. A key novelty of our approach over the state of the art is the joint prediction of the patient's response to therapy along with an estimate of the time-to-response (i.e., the predicted time needed for the therapy to succeed or fail). Findings: We tested our methodology on data from The Cancer Genome Atlas (TCGA) Program for three cancer types (breast, stomach, and colorectal cancer), 10 pharmacological regimens, and 13 homogeneous cohorts.
For the therapy-response prediction task, we developed models that attain an average AUC of 0.749, an average p-value of 0.030, and an average accuracy of 0.809 with balanced positive and negative predictive values. For the time-to-event prediction task, we developed regression models for the 13 homogeneous cohorts that attain an average (geometric) concordance index of 0.782 (max 0.904, min 0.651) with an average log-likelihood p-value of 0.004, improving in nine of the 13 cohorts upon models based only on clinical parameters, which have an average concordance index of 0.678 and an average p-value of 0.006. Interestingly, we attain statistically significant results even with quite small therapy-homogeneous cohorts (ranging from 7 to 32 patients). Conclusions: The ability to predict with high accuracy the response of a cancer patient to a chosen pharmacological regimen, along with an estimate of the time-to-response, helps adapt the clinical decision process to the specific patient profile, thus increasing the likelihood of correct and timely therapeutic decisions.
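The concordance index used to evaluate the time-to-response models measures how often the model ranks pairs of patients in the correct temporal order. A minimal sketch for the fully observed (uncensored) case, with invented times and risk scores, not the study's data:

```python
from itertools import combinations

def concordance_index(event_times, predicted_risk):
    """Harrell's C-index for fully observed (uncensored) event times.

    A pair is concordant when the subject with the shorter time-to-event
    was assigned the higher predicted risk; tied risks count as 0.5.
    """
    concordant, total = 0.0, 0
    for i, j in combinations(range(len(event_times)), 2):
        if event_times[i] == event_times[j]:
            continue  # tied event times are not comparable here
        total += 1
        shorter, longer = (i, j) if event_times[i] < event_times[j] else (j, i)
        if predicted_risk[shorter] > predicted_risk[longer]:
            concordant += 1.0
        elif predicted_risk[shorter] == predicted_risk[longer]:
            concordant += 0.5
    return concordant / total

times = [5, 8, 12, 20]        # hypothetical times-to-response
risk  = [0.9, 0.7, 0.8, 0.2]  # hypothetical model risk scores
print(concordance_index(times, risk))  # 5 of 6 comparable pairs concordant ≈ 0.833
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ordering, which is why the reported 0.782 versus 0.678 for clinical-only models is a meaningful gap. Survival analysis in practice also handles censored observations, which this sketch omits.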
Pham, T. D.; Marks, K.; Hughes, D.; Chatzopoulou, D.; Coulthard, P.; Holmes, S.
Head injuries are a leading global cause of mortality and disability, highlighting the critical need for advanced prognostic tools to inform clinical decision-making and optimize healthcare resource utilization. For the first time, this study introduces a cutting-edge artificial intelligence (AI) framework designed to predict mortality outcomes from head injury narratives. Leveraging deep learning-based natural language processing techniques, the framework identifies and extracts key features from unstructured text describing injury mechanisms and patient conditions to train predictive models. Validation was conducted on a diverse dataset of 1,500 head injury cases using a stratified holdout approach, with 90% allocated for training and 10% for testing. The one-dimensional convolutional neural network model demonstrated strong performance, achieving on average 85% accuracy, 74% correct mortality prediction, 88% correct survival prediction, and an area under the receiver operating characteristic curve of 0.91. This work highlights the transformative potential of AI in harnessing narrative clinical data to enhance prognostic accuracy, paving the way for more effective, evidence-based management of head injury patients.
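The three headline figures correspond to overall accuracy plus the per-class correct-prediction rates (class-wise recall) for deaths and survivals. A minimal sketch of how they are computed, with made-up labels rather than the study's data:

```python
def class_metrics(y_true, y_pred):
    """Overall accuracy plus per-class recall for binary mortality labels
    (1 = died, 0 = survived). The label lists below are illustrative only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    mortality_recall = tp / y_true.count(1)  # "correct mortality prediction"
    survival_recall = tn / y_true.count(0)   # "correct survival prediction"
    return accuracy, mortality_recall, survival_recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # invented outcomes
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # invented model predictions
acc, mort, surv = class_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} mortality_recall={mort:.2f} survival_recall={surv:.2f}")
```

Reporting both per-class rates matters here because mortality is typically the rarer class, so overall accuracy alone would overstate performance on the deaths the model is meant to catch.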